2026-06-22
fergusfinn.com
large-language-models
Adaptive speculative decoding: picking draft lengths at runtime
Researchers have developed adaptive speculative decoding, a method that dynamically selects draft lengths at runtime to optimize token generation efficiency in large language models. The approach addr…
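The summary is cut off before it describes the selection rule, so the sketch below is not the post's method. It shows one common way to pick a draft length at runtime, under stated assumptions. Each drafted token is assumed to be accepted independently with probability `alpha`, which is the simplified model from the original speculative decoding analysis (Leviathan et al., 2023). Under that model, a step that drafts `k` tokens yields (1 - alpha^(k+1)) / (1 - alpha) tokens in expectation. The cost of a step is assumed to be `k * draft_cost + target_cost`, meaning drafting cost grows linearly with `k` and one verification pass has a fixed cost. The controller estimates `alpha` online from recent acceptances and chooses the `k` that maximizes expected tokens per unit cost. All names (`expected_tokens_per_step`, `pick_draft_length`, `AdaptiveDraftController`) and default parameter values are hypothetical.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens from one draft-then-verify step with draft length k.

    Assumes i.i.d. acceptance with probability alpha (a simplifying model).
    The target model always contributes one token, so k=0 yields exactly 1.
    """
    if alpha >= 1.0:
        return k + 1.0
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def pick_draft_length(alpha: float, draft_cost: float, target_cost: float,
                      max_k: int = 16) -> int:
    """Return the k in [0, max_k] that maximizes expected tokens per unit cost.

    Hypothetical cost model: one step costs k * draft_cost + target_cost.
    """
    best_k, best_rate = 0, -1.0
    for k in range(max_k + 1):
        rate = expected_tokens_per_step(alpha, k) / (k * draft_cost + target_cost)
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k


class AdaptiveDraftController:
    """Hypothetical runtime controller: track acceptance, then choose k."""

    def __init__(self, draft_cost: float, target_cost: float, max_k: int = 16,
                 decay: float = 0.9, prior_alpha: float = 0.7,
                 prior_weight: float = 10.0):
        self.draft_cost = draft_cost
        self.target_cost = target_cost
        self.max_k = max_k
        self.decay = decay
        # Exponentially decayed pseudo-counts of accepted tokens and
        # Bernoulli trials, seeded with a prior so alpha is defined at start.
        self.successes = prior_alpha * prior_weight
        self.trials = prior_weight

    @property
    def alpha(self) -> float:
        return self.successes / self.trials

    def observe(self, drafted: int, accepted: int) -> None:
        """Record one verification result.

        Verification stops at the first rejection, so the trials we actually
        observed are the accepted tokens plus one rejected token (if any).
        """
        rejected = 1 if accepted < drafted else 0
        self.successes = self.decay * self.successes + accepted
        self.trials = self.decay * self.trials + accepted + rejected

    def next_draft_length(self) -> int:
        return pick_draft_length(self.alpha, self.draft_cost,
                                 self.target_cost, self.max_k)
```

With a cheap draft model (`draft_cost=0.05` relative to `target_cost=1.0`), `pick_draft_length` returns 3 at `alpha=0.5` and 13 at `alpha=0.9`, so higher acceptance favors longer drafts. If every draft is rejected immediately, the estimate of `alpha` falls toward zero and the controller stops drafting (`k=0`). The decay factor controls how quickly the estimate tracks shifts in acceptance, for example between easy and hard stretches of a generation.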